ABSTRACT

Apolipoprotein B mRNA-editing, enzyme-catalytic, 
polypeptide-like 3G (i.e., APOBEC3G or A3G) is an 
evolutionarily conserved cytosine deaminase that 
potently restricts human immunodeficiency virus 
type 1 (HIV-1), retrotransposons and other viruses. 
A3G has a nucleotide target site specificity for 
cytosine dinucleotides, though only certain 
cytosine dinucleotides are 'hotspots' for cytosine 
deamination, and others experience little or no 
editing by A3G. The factors that define these 
critical A3G hotspots are not fully understood. To 
investigate how A3G hotspots are defined, we 
used an in vitro fluorescence resonance energy 
transfer-based oligonucleotide assay to probe 
the site specificity of A3G. Our findings strongly
suggest that the target single-stranded DNA
(ssDNA) secondary structure, as well as the bases
directly 3' and 5' of the cytosine dinucleotide, are
critically important for A3G recognition. For instance,
A3G cannot readily deaminate a cytosine dinucleotide
in ssDNA stem structures or in nucleotide base
loops composed of three bases. Single-stranded
nucleotide loops up to seven bases in length were
poor targets for A3G activity unless cytosine
residues flanked the cytosine dinucleotide.
Furthermore, we observed that A3G favors 
adenines, cytosines and thymines flanking the 
cytosine dinucleotide target in unstructured 
regions of ssDNA. Low cytosine deaminase activity 
was detected when guanines flanked the cytosine 
dinucleotide. Taken together, our findings provide
the first demonstration that A3G cytosine deamination
hotspots are defined by both the sequence
context of the cytosine dinucleotide target and
the ssDNA secondary structure. This knowledge
can be used to better trace mutations back
to A3G activity and to illuminate its impact on
processes such as HIV-1 genetic variation.

INTRODUCTION 

Apolipoprotein B mRNA-editing, enzyme-catalytic, poly- 
peptide-like 3G (APOBEC3G or A3G) is an important 
host restriction factor that can inhibit human immuno- 
deficiency virus type 1 (HIV-1) and other viruses via 
cytosine deamination of viral genomic DNA (1). In the 
presence of the HIV-1 Vif protein, the activity of
APOBEC3G is attenuated, and the residual deamination
activity of A3G may contribute to the high mutation rate
of HIV-1, virus evolution and antiretroviral drug
resistance (2-4). However, when the activity of Vif is
moderated or extinguished, A3G potently restricts viral
replication (1,5,6). This restriction largely results from
the high level of deamination during HIV-1 reverse tran- 
scription, which can lead to degradation of DNA with 
abasic residues, a decrease in the specificity of plus-strand 
initiation, and accumulation of lethal G-to-A mutations 
on the plus strand (i.e., hypermutation) (7-9). Although 
it is known that A3G acts exclusively on single-stranded 
DNA (ssDNA) and acts preferentially at specific sites in 
a sequence of DNA, termed 'hotspots,' the factors that 
create these critical restriction hotspots are not fully 
understood (10,11). A3G requires a cytosine dinucleotide
context on ssDNA, and studies have shown that A3G
tends to favor CCCA or T/CCC sequences (12,13).
However, distinct restriction hotspots on the viral genome
often occur outside this four-base context, as A3G
can deaminate a variety of other sites (14).
Interestingly, many CCCA or T/CCC sites are not
edited by A3G (3).

As noted above, published observations to date
indicate that hotspot specificity for A3G must be
determined by more than the bases immediately 3' and 5'
of the required cytosine dinucleotide sequence. Nor can
specific sequences far upstream or downstream of a
hotspot be necessary for recognition, since A3G can
efficiently deaminate oligonucleotides as short as 16 or 13 nt
in vitro (15,16). However, it is formally possible that certain
distal sequences play some role in large ssDNAs
in vivo. Given that cytosine residues in small oligonucleotides
can be efficiently deaminated in a variety of sequence contexts,
some other feature of ssDNA in cells is likely protecting
otherwise favorable sites from deamination. For example,
DNA-binding proteins could make some deamination
sites inaccessible to A3G. In the case of HIV-1 infection,
the HIV-1 nucleocapsid (NC) protein is a known
DNA-binding protein that has important functions in
the HIV-1 life cycle (17-22). However, binding of HIV-1 NC
and A3G to target oligonucleotides is non-competitive,
suggesting that NC may not prevent A3G from accessing
a particular target site and could actually enhance A3G
binding (16). Another possibility is that certain HIV-1
ssDNA regions are protected from A3G by secondary
structure folding that occurs during reverse transcription.
A3G does not act on dsDNA templates, and therefore
ssDNA secondary structure (e.g., stem structures) could
act as an accessibility barrier for A3G (23). Cytosine
bases in small loop structures may also be inaccessible
targets, particularly if physical constraints prevent the
proper contacts between enzyme and substrate.

In this study, we have investigated the nature of A3G 
target sites, in particular the impact of nucleotide bases 
adjacent to the cytosine dinucleotide target as well as the 
influence of secondary structure in ssDNAs. By
systematically changing the bases on either side of the
cytosine dinucleotide target, we demonstrate that A3G
prefers certain flanking bases for optimal cytosine
deamination. We also observed that DNA stems are poor
targets for A3G and can protect an otherwise favorable
target sequence from cytosine deamination. Small loop
structures can also protect potential A3G target sequences
from cytosine deamination. Taken together, our
findings provide the first demonstration that A3G 
cytosine deamination hotspots are defined by both the 
sequence context of the cytosine dinucleotide target as 
well as the ssDNA secondary structure. Such observa- 
tions provide further information for predicting the loca- 
tions of cytosine deamination by A3G, which is of 
particular importance in tracing the origins of HIV-1 
genetic variation in vivo. 

MATERIALS AND METHODS 

Preparation of cell lysates 

A cell line stably expressing A3G, 293-A3G clone 10 (3),
was cultured in DMEM supplemented with 10%
FetalClone3 (FC3, Hyclone) serum, 1% penicillin/
streptomycin (Invitrogen) and 225 µg/ml neomycin
(Invitrogen). The parental 293 cells were maintained in
DMEM with 10% FC3, 1% penicillin/streptomycin and
225 µg/ml neomycin. Cells were incubated at 37°C in 5%
CO2. Cell lysates for in vitro assays of A3G activity were
prepared as previously described (15). Briefly, 5 × 10^6 cells
(293 or 293-A3G clone 10) were centrifuged at 1000 rpm
for 5 min. Cells were resuspended in 250 µl of lysis buffer
(0.626% NP40, 10 mM Tris-acetate pH 7.4, 50 mM potassium
acetate, 100 mM NaCl and 10 mM EDTA) and 50 µl
of protease inhibitor cocktail (Cat. # P2714-1BTL, Sigma-Aldrich),
and incubated on ice for 15 min. Cell lysates
were centrifuged at 1000 rpm for 2 min, and the supernatants
were transferred to a pre-chilled tube and centrifuged
at 16 000 rpm for 10 min. The clarified supernatants
were then transferred to a new pre-chilled tube and
stored at -80°C prior to use.

Oligonucleotide design and synthesis 

ssDNA oligonucleotide secondary structures were predicted
using mFold with the default DNA settings
(http://mfold.rna.albany.edu/?q=mfold/DNA-Folding-Form) (24).
The oligonucleotides selected for synthesis
had only a single predicted structure and were synthesized
dual-labeled with TAMRA and FAM fluorophores
(Sigma-Aldrich). Table 1 lists the oligonucleotides
used in this study.
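Each oligonucleotide name in Table 1 encodes the base immediately 5' and 3' of the target CC (for example, AccA or GccC). The short Python sketch below recovers such a name from a sequence, purely as a consistency check on the naming. The helper `flank_name` is hypothetical, and the 0-based start index of the target CC (6 for the Set 1 oligonucleotides, 3 for the Set 2 and single open oligonucleotides) is our reading of the Table 1 sequences rather than something stated in the text. The index has to be supplied because sequences such as CccC contain more than one CC.

```python
def flank_name(seq: str, cc_start: int) -> str:
    """Return an 'XccY' label from the bases flanking the CC at cc_start.

    Hypothetical helper: cc_start is the 0-based index of the first C of the
    target dinucleotide and is supplied by the caller, not inferred.
    """
    seq = seq.upper()
    if seq[cc_start:cc_start + 2] != "CC":
        raise ValueError(f"no CC dinucleotide at index {cc_start}")
    if cc_start < 1 or cc_start + 2 >= len(seq):
        raise ValueError("target CC needs a flanking base on both sides")
    five_prime = seq[cc_start - 1]    # base immediately 5' of the CC
    three_prime = seq[cc_start + 2]   # base immediately 3' of the CC
    return f"{five_prime}cc{three_prime}"


if __name__ == "__main__":
    # Sequences copied from Table 1; the CC start indices are our assumption.
    examples = [
        ("AccA Open Set 1", "ATTGAACCAGAATGATGTCATTGAATATG", 6),
        ("CccC Open Set 1", "ATTGACCCCGAATGATGTCATTGAATATG", 6),
        ("GccC Open", "AAGCCCGAAGAGAGATTCGAATAAG", 3),
        ("TccG Open", "AATCCGCGAGAGAGATCGCATCAAG", 3),
    ]
    for name, seq, idx in examples:
        print(name, "->", flank_name(seq, idx))
```

In every case checked this way, the label computed from the sequence matches the name given in Table 1.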

A3G FRET assay 

A fluorescence resonance energy transfer (FRET)-based
assay, adapted with minor modifications from previously
described assays (15,25), was used to detect the cytosine
deaminase activity of A3G on DNA oligonucleotide substrates.
Cell lysates were diluted 3:2 in lysis buffer, and
20 µl of the diluted lysate was used per assay in white
96-well assay plates (Bio-Rad). A separate solution
of 20 pmol of oligonucleotide, 10 µg RNase A and 0.04
U uracil DNA glycosylase (UDG) was prepared
in 50 mM Tris pH 7.4, 10 mM EDTA buffer,
adjusted to a total volume of 50 µl and transferred to
the assay well. The assay plate was then incubated at
37°C for 5 h. Next, 30 µl of 2 M Tris-acetate, pH 7.9,
was added to each well, and the plate was incubated at
95°C for 2 min and at 4°C for 2.5 min in a CFX96
real-time PCR system (Bio-Rad). The fluorescence was
then measured at 4°C. The endpoint fluorescence from
the parental 293 cell lysate was subtracted from all
experimental samples in order to calculate the relative change
in fluorescence due to A3G activity. Experiments were
conducted with three independent replicates. For assays
involving HIV-1 NC protein (purified NC protein graciously
provided by Dr Rob Gorelick, SAIC, Frederick,
MD), the oligonucleotide substrates were first incubated with
HIV-1 NC protein (5 nt per NC) for 1 h on ice.
Following this incubation, cell lysates prepared as
described above were added to each well for 5 h at 37°C,
and fluorescence was then measured.
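The background subtraction described above can be written out as a small calculation. The Python sketch below computes ΔRFU for each independent experiment by subtracting the parental 293 lysate reading from the A3G lysate reading, then reports the mean and standard deviation across experiments. The function names and the RFU readings are hypothetical. The use of the sample standard deviation (n - 1 denominator) is also an assumption, since the text does not say which form was used.

```python
from statistics import mean, stdev


def delta_rfu(a3g_rfu, parental_rfu):
    """Hypothetical helper: per-experiment A3G lysate RFU minus parental 293 lysate RFU."""
    if len(a3g_rfu) != len(parental_rfu):
        raise ValueError("each experiment needs both an A3G and a parental reading")
    return [a - p for a, p in zip(a3g_rfu, parental_rfu)]


def summarize(deltas):
    """Mean and sample standard deviation (n - 1 denominator) across experiments."""
    return mean(deltas), stdev(deltas)


if __name__ == "__main__":
    # Hypothetical endpoint readings for three independent experiments.
    a3g = [1500.0, 1620.0, 1580.0]
    parental = [400.0, 420.0, 380.0]
    deltas = delta_rfu(a3g, parental)
    avg, sd = summarize(deltas)
    print(f"dRFU per experiment: {deltas}")
    print(f"mean = {avg:.1f}, SD = {sd:.1f}")
```

With these made-up readings, the per-experiment ΔRFU values are 1100, 1200 and 1200, giving a mean of about 1166.7 and an SD of about 57.7. Pairing the subtraction within each experiment, rather than subtracting one pooled background, follows the figure legends' statement that ΔRFU was calculated for each experiment.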

Restriction enzyme FRET assay 

To further validate the predicted structures of the FRET
oligos, selected oligos were digested with restriction
enzymes. Regions of an oligo that form a stem secondary
structure would be cut by the corresponding restriction enzyme,
releasing the FRET signal. Briefly, 20 pmol of the
GccG set 2 open or stem oligo and 5 U of AciI (NEB)
were added to 1x Buffer 3 (NEB) in a 100 µl
total volume. For the GccG set 1 open or stem oligo, 20
pmol was added with 2 U of MspI (NEB) to
1x Buffer 4 (NEB) in a 100 µl total volume.
Mixes were added to white 96-well plates
(Bio-Rad) and incubated at 37°C for 30 min in a C1000
thermal cycler with a CFX96 real-time system (Bio-Rad).
Subsequently, the temperature was adjusted to 4°C for
30 s and the plate was read. Experiments were conducted
with three independent replicates.

Nucleic Acids Research, 2013, Vol. 41, No. 12 6141

Table 1. Oligonucleotide sequences used in the analysis of the influence of nucleotide sequence and ssDNA secondary structure on the in vitro activity of APOBEC3G

Oligonucleotide     ΔG (kcal/mol)   Sequence 5'-3'
AccA Open Set 1     -1.90           ATTGAACCAGAATGATGTCATTGAATATG
CccC Open Set 1     -1.90           ATTGACCCCGAATGATGTCATTGAATATG
TccT Open Set 1     -1.90           ATTGATCCTGAATGATGTCATTGAATATG
GccG Open Set 1     -1.90           ATTGAGCCGGAATGATGTCATTGAATATG
AccA Stem Set 1     -3.77           ATTGAACCAGAATGATGTCTGGGAATATG
CccC Stem Set 1     -7.37           ATTGACCCCGAATGATGTCGGGGAATATG
TccT Stem Set 1     -3.30           ATTGATCCTGAATGATGTCAGGTAATATG
GccG Stem Set 1     -4.81           ATTGAGCCGGAATGATGTCCGGGAATATG
AccA Open Set 2     -2.36           AAACCACGAGAGAGATCGTACTAAG
CccC Open Set 2     -3.35           AACCCCCGAGAGAGATCGGATGAAG
TccT Open Set 2     -2.23           AATCCTCGAGAGATATCTATAAAAG
GccG Open Set 2     -3.47           AAGCCGCGAGAGAGATCGCATCAAG
AccA Stem Set 2     -7.18           AAACCACGAGAGAGATCGTGGTAAG
CccC Stem Set 2     -9.31           AACCCCCGAGAGAGATCGGGGGAAG
TccT Stem Set 2     -6.36           AATCCTCGAGAGAGATCGAGGAAAG
GccG Stem Set 2     -9.96           AAGCCGCGAGAGAGATCGCGGCAAG
GccA Open           -4.42           AAGCCACAAGAGAGATCTTGCTAAG
GccT Open           -3.94           AAGCCTAAAGAGAGATCTTTGAAAG
GccC Open           -2.51           AAGCCCGAAGAGAGATTCGAATAAG
GccC Stem           -8.39           AAGCCCGAAGAGAGATTCGGGCAAG
AccG Open           -3.47           AAACCGCGAGAGAGATCGCACTAAG
AccT Open           -1.46           AAACCTCGAGACAGATCGAACTAAG
CccC Bulge          -5.30           AATGAAGCCCCGAGCAACTCGGCTTCTATG
dU Bulge            -5.30           AATGAAGCCUCGAGCAACTCGGCTTCTATG
AccC Open           -3.35           AAACCCCGAGAGAGATCGGACTAAG
TccA Open           -2.36           AATCCACGAGACAGATCGTACTAAG
TccA Stem           -6.67           AATCCACGAGACAGATCGTGGAAAG
TccC Open           -3.35           AATCCCCGAGAGAGATCGGACTAAG
TccG Open           -3.47           AATCCGCGAGAGAGATCGCATCAAG
3 Loop AccA         -3.85           ATTGATGCTGACCATCAGCTAATATG
3 Loop CccC         -4.84           ATTGATGCTGCCCCGCAGCTAATATG
4 Loop AccA         -4.60           ATTGATGCTGAACCATCAGCTAATATG
4 Loop CccC         -4.10           ATTGATGCTGACCCCTCAGCTAATATG
5 Loop AccA         -4.60           ATTGATGCTGAACCAGTCAGCTAATATG
5 Loop CccC         -4.40           ATTGATGCTGAACCCCTCAGCTAATATG
6 Loop AccA         -3.90           ATTGATGCTGAAACCAGTCAGCTAATATG
6 Loop CccC         -3.90           ATTGATGCTGAACCCCGTCAGCTAATATG
7 Loop CccC         -3.50           ATTGATGCTGAGACCCCGTCAGCTAATATG
7 Loop AccA         -3.50           ATTGATGCTGAGAACCAGTCAGCTAATATG
8 Loop AccA         -3.70           ATTGATGCTGAGAACCAGATCAGCTAATATG
8 Loop CccC         -3.70           ATTGATGCTGAGACCCCGATCAGCTAATATG
8 Loop TccT         -3.70           ATTGATGCTGAGATCCTGATCAGCTAATATG
8 Loop GccG         -3.70           ATTGATGCTGAGAGCCGGATCAGCTAATATG
9 Loop AccA         -3.70           ATTGATGCTGAGAACCAAGATCAGCTAATATG
10 Loop AccA        -3.70           ATTGATGCTGACGAACCAAGATCAGCTAATATG
dU Open             -2.67           AAACUACGAGAGAGATCGTGCTAAG
dU Stem             -1.72           ATTGATCCUTGAATGATGTCAGGGGATATG
dU 3 Loop           -3.85           ATTGATGCTGACUATCAGCTAATATG



RESULTS 

Sequence context as well as ssDNA secondary structure 
define A3G cytosine deaminase hotspots 

Two sets of ssDNA oligonucleotides were initially tested
in which the cytosine dinucleotide was located either in an
open (unstructured) region or within a
structured stem; representative examples of the oligonucleotide
structures are shown in Figure 1A. The oligonucleotide
design strategy helped to minimize sequence
changes between the oligonucleotide pairs and distal
to the deamination site. However, some additional nucleotide
changes were required for some of the
oligonucleotides tested in order for the CC dinucleotide to
be in the correct structural location within the most stable
structure (Table 1 and Supplementary Figure S1). In particular,
the CccC Stem Set 1 has an extra base pair in the
stem, though the oligonucleotide sequence is consistent
with what is shown in Figure 1A. For the TccT Open
Set 2 oligonucleotide, the number of loop bases was
reduced from 6 to 4, and the number of bases
involved in the stem decreased from 7 to 3.
Finally, for the 5 Loop CccC oligonucleotide, the
bottom base in the '5 loop' shown in Figure 2A was changed
to a C residue rather than a G residue. The
specific mFold structures predicted with the default
settings for these three oligonucleotides are shown in
Supplementary Figure S1. Other oligonucleotides were
tested in which the bases on either side of the cytosine
dinucleotide, the positions most critical for A3G
recognition (11), were changed.

Figure 1. Nucleotide sequence context and ssDNA secondary structure help to define A3G cytosine deaminase hotspots. (A) Oligonucleotides
containing the cytosine dinucleotide targeted by A3G, dual-labeled with TAMRA and FAM fluorophores. The red 'CC' dinucleotide
bases represent the A3G target site. The blue 'X' bases represent the positions at which nucleotide bases were changed. The 'open'
oligonucleotides are those in which the target cytosine dinucleotide is located in the unstructured region of the ssDNA, and the
'stem' oligonucleotides are those in which the target cytosine dinucleotide is located within the stem structure. (B) The
change in relative fluorescence units (ΔRFU) was calculated for each experiment by subtracting the RFU from the control 293 cell lysates (baseline
negative control) from that of the 293 cell lysates that stably express A3G. The error bars represent the standard deviation from three independent
experiments. The positive control for these experiments was an oligonucleotide previously reported to be cleaved by A3G in an oligonucleotide-based
FRET assay (15). (C) The ΔRFU was calculated as described above. The average and standard deviation from three independent experiments
are shown.

Cell lysates prepared from 293 cells stably expressing A3G or
from parental 293 cells were incubated with each oligonucleotide
along with UDG and RNase A, as described in
the Materials and Methods section. If A3G deaminated the
target cytosine, the resulting uracil would be excised by UDG,
leaving an abasic site. Hydrolysis at the abasic site would then
cleave the oligonucleotide and release the FAM signal from the
FRET pair.

Figure 2. A3G cytosine deaminase activity against a target cytosine dinucleotide is influenced by its location in ssDNA base loops but not in a DNA
bulge. (A) Oligonucleotides used to investigate the influence of ssDNA loop size on A3G activity. The red 'CC' dinucleotide bases
represent the A3G target site. The blue 'X' bases represent the positions at which nucleotide bases were changed. (B) The change in relative
fluorescence units (ΔRFU) was calculated for each experiment by subtracting the RFU from the control 293 cell lysates (baseline negative control)
from that of the 293 cell lysates that stably express A3G. The x-axis indicates the number of nucleotide bases in the ssDNA loop. The error bars represent
the standard deviation from three independent experiments.
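The detection chain described above (deamination, uracil excision by UDG, cleavage at the abasic site, separation of the FRET pair) can be traced with a deliberately simplified Python model. All function names are hypothetical, and the model ignores kinetics, secondary structure and the RNase A step. Two assumptions are built in. First, the two fluorophores sit at opposite ends of the oligonucleotide. Second, A3G converts the 3' C of the target CC, which matches how the dU control oligonucleotides in Table 1 differ from their parents (for example, 3 Loop AccA ...GCTGACCATC... versus dU 3 Loop ...GCTGACUATC...).

```python
from typing import List


def deaminate(seq: str, pos: int) -> str:
    """Hypothetical model of A3G activity: convert the C at 0-based `pos` to U."""
    if seq[pos] != "C":
        raise ValueError(f"position {pos} is {seq[pos]!r}, not C")
    return seq[:pos] + "U" + seq[pos + 1:]


def udg_excise(seq: str) -> str:
    """Hypothetical model of UDG: every uracil becomes an abasic site, written '-'."""
    return seq.replace("U", "-")


def cleave(seq: str) -> List[str]:
    """Hypothetical model of strand hydrolysis at abasic sites; returns the fragments."""
    return seq.split("-")


def fam_released(seq: str) -> bool:
    """With the dyes assumed at opposite ends, any cut separates them."""
    return len(cleave(udg_excise(seq))) > 1


if __name__ == "__main__":
    accA_open_set2 = "AAACCACGAGAGAGATCGTACTAAG"   # from Table 1; CC at indices 3-4
    edited = deaminate(accA_open_set2, 4)          # 3' C of the CC (assumption)
    print(edited, cleave(udg_excise(edited)), fam_released(edited))
    print(accA_open_set2, fam_released(accA_open_set2))
```

The unedited oligonucleotide contains no uracil, so no abasic site forms, the strand stays intact and no signal is produced. This is why only deamination by A3G is expected to raise FAM fluorescence above the parental-lysate background.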

Figure 1B demonstrates that the bases on either side
of the cytosine dinucleotide are important for A3G
activity when it is located in a non-structured region. In particular,
we observed that adenine, cytosine or thymine
bases on either side of the cytosine dinucleotide increased
A3G activity (P < 0.05), whereas guanine bases on either
side of the dinucleotide had a reduced but still significant effect
(P < 0.05). These results indicate that adenine, cytosine
or thymine bases on either side of the cytosine dinucleotide
enhance A3G activity and that guanine bases limit A3G
activity.

We further explored the nature of the nucleotide
bases on either side of the cytosine dinucleotide by
investigating the base preference on the 5' or 3' side of the
dinucleotide site (Figure 1C). When the 5'
base was a guanine, little activity was detected
when the 3' base was an adenine or thymine, but a high
A3G signal was observed when the 3' base was a cytosine.
When the 5' base was an adenine, there was moderate
A3G activity unless the 3' base was a cytosine. Finally,
when the 5' base was a thymine, A3G activity was
moderate to relatively high when the 3' base was either a
guanine or an adenine, and activity was enhanced when the 3'
base was a cytosine.

Significantly reduced A3G activity was observed when
cytosine dinucleotides were located within an oligonucleotide
stem, in every sequence context tested, indicating that A3G has
difficulty accessing target bases located in regions of secondary
structure (Figure 1C). Taken together, these observations indicate
that both the nucleotide bases on either side of the cytosine
dinucleotide and its location within secondary structure can
define A3G hotspots.
Structural constraints in DNA loop bases can limit
A3G hotspots 

Given our observation that ssDNA secondary structure
can attenuate A3G activity, we further investigated how
the location of cytosine dinucleotides in ssDNA loops
could affect A3G activity. To do this, we tested oligonucleotides
in which the cytosine dinucleotide was located in
either a stem bulge or an ssDNA loop ranging
from 3 to 10 bases in size (Figure 2A).
A3G activity was not affected when the cytosine dinucleotide
was located in a stem bulge, but was low when the
dinucleotide was located in a 3-nt loop, whether
adenines or cytosines flanked the cytosine dinucleotide
(Figure 2B). This indicates that 3-nt loops can protect
A3G hotspots. Interestingly, when cytosine bases flanked
the cytosine dinucleotide, high A3G activity was observed
within loops of 4-8 nt, but this was not the case when adenines
flanked the cytosine dinucleotide. Low activity was detected when
adenines flanked the cytosine dinucleotide in the 4-nt
loop, and moderate A3G activity was detected when the
cytosine dinucleotide was located in loops of 5-7 nt
with adenine bases flanking (Figure 2B). Higher activity
was observed in loops of 8-10 nt with adenines flanking.
This indicates that loop structures of seven or fewer bases can
protect cytosine dinucleotides flanked by adenine bases.
When cytosine bases flank the cytosine dinucleotide, protection
is observed only in a 3-nt loop. In 8-nt loops, moderate or low
A3G activity was observed with thymine or guanine bases flanking
the cytosine dinucleotide, respectively. This observation
complements the observations made in the absence of
secondary structure.
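To make the loop-size series in Figure 2A concrete, the Python sketch below measures the loop length of each AccA loop oligonucleotide in Table 1. It assumes, from our own reading of the sequences, that every hairpin in this series closes with the same 5-bp stem: 5'-GCTGA-3' paired with its reverse complement 5'-TCAGC-3'. The helper names are hypothetical. The method is a plain string search, not the mFold prediction the study relied on.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def hairpin_loop(seq: str, arm5: str = "GCTGA") -> str:
    """Hypothetical helper: return the bases between a 5' stem arm and its reverse complement.

    Assumes the first occurrence of arm5 and the next downstream occurrence
    of revcomp(arm5) form the stem (our reading of Table 1, not mFold output).
    """
    arm3 = revcomp(arm5)
    start = seq.find(arm5)
    if start < 0:
        raise ValueError(f"5' arm {arm5} not found")
    loop_start = start + len(arm5)
    loop_end = seq.find(arm3, loop_start)
    if loop_end < 0:
        raise ValueError(f"3' arm {arm3} not found downstream")
    return seq[loop_start:loop_end]


ACCA_LOOPS = {  # name -> sequence, copied from Table 1
    "3 Loop AccA": "ATTGATGCTGACCATCAGCTAATATG",
    "4 Loop AccA": "ATTGATGCTGAACCATCAGCTAATATG",
    "5 Loop AccA": "ATTGATGCTGAACCAGTCAGCTAATATG",
    "6 Loop AccA": "ATTGATGCTGAAACCAGTCAGCTAATATG",
    "7 Loop AccA": "ATTGATGCTGAGAACCAGTCAGCTAATATG",
    "8 Loop AccA": "ATTGATGCTGAGAACCAGATCAGCTAATATG",
    "9 Loop AccA": "ATTGATGCTGAGAACCAAGATCAGCTAATATG",
    "10 Loop AccA": "ATTGATGCTGACGAACCAAGATCAGCTAATATG",
}

if __name__ == "__main__":
    for name, seq in ACCA_LOOPS.items():
        loop = hairpin_loop(seq)
        print(f"{name}: loop {loop} ({len(loop)} nt)")
```

Under this assumption, the computed loop lengths (3 to 10 nt) agree with the number in each oligonucleotide's name. This suggests the series varies only the loop while keeping the closing stem constant.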

Because the binding of HIV-1 NC and A3G to target oligonucleotides
is non-competitive, NC protein may not prevent access of A3G
to a particular target site and may even enhance A3G binding (16).
It is also formally possible that NC protects certain HIV-1 ssDNA
regions in conjunction with the secondary structure folding that
occurs during reverse transcription. Since A3G does not act on
dsDNA templates, ssDNA secondary structure (e.g., stem structures)
could act as an accessibility barrier for A3G (23). It is also
conceivable that cytosine bases in small loop structures
may be inaccessible due to physical constraints. To test
for potential effects of HIV-1 NC on A3G activity, we
selected an oligonucleotide pair that showed a
clearly significant difference in FRET signal depending on whether
the CC dinucleotide target was located in a non-structured or a
structured region (i.e., the AccA set 2 open and stem oligonucleotides;
Table 1). An HIV-1 NC concentration of 5 nt per NC was chosen
because it is physiologically relevant, based upon what is predicted
for the virus particle (26,27). As shown in Figure 3, the addition of
NC had no effect on A3G activity with either the AccA set 2 open
or stem oligonucleotide.

Figure 3. No effect of HIV-1 NC protein on the efficiency of
A3G deamination. The AccA set 2 open and stem oligonucleotides
were incubated in the presence or absence of HIV-1 NC protein
(concentration of 5 nt per NC protein). The change in relative fluorescence
units (ΔRFU) was calculated for each experiment by subtracting the
RFU from the control 293 cell lysates (baseline negative control) from
that of the 293 cell lysates that stably express A3G. Error bars represent the
standard deviation from three independent experiments.
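The ratio of 5 nt per NC converts into an amount of NC protein by simple proportion. The Python sketch below assumes the 20 pmol of oligonucleotide used in the standard FRET assay and counts only the nucleotides of the oligonucleotide itself. The function name is hypothetical.

```python
def nc_pmol(oligo_pmol: float, oligo_len_nt: int, nt_per_nc: float = 5.0) -> float:
    """Hypothetical helper: pmol of NC giving one NC per `nt_per_nc` nucleotides."""
    total_nt_pmol = oligo_pmol * oligo_len_nt   # pmol of nucleotides in the reaction
    return total_nt_pmol / nt_per_nc


ACCA_SET2_OPEN = "AAACCACGAGAGAGATCGTACTAAG"  # from Table 1
ACCA_SET2_STEM = "AAACCACGAGAGAGATCGTGGTAAG"  # from Table 1

if __name__ == "__main__":
    for name, seq in [("open", ACCA_SET2_OPEN), ("stem", ACCA_SET2_STEM)]:
        print(f"AccA set 2 {name}: {len(seq)} nt -> {nc_pmol(20, len(seq)):.0f} pmol NC")
```

Both AccA set 2 oligonucleotides are 25 nt long, so under these assumptions each reaction would need 20 × 25 / 5 = 100 pmol of NC.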

It has been previously demonstrated that UDG excises
uracil residues more efficiently from ssDNA than from dsDNA
(28), and that the excision of uracil from loops is inefficient.
To confirm that the differences in FRET signal observed when
the target cytosine was located in a non-paired region, a stem,
a bulge or a DNA loop are due to A3G activity and not to UDG,
we synthesized oligonucleotides that contain uracil in these
different locations. Figure 4 shows that UDG readily excised
the uracil residue at each of these positions, indicating that the
differences we observed are due to A3G activity and not to UDG.

Figure 4. UDG activity is undiminished on ssDNA secondary structures.
The effect of uracil location in oligonucleotides was investigated.
Four different oligonucleotides were used in which the target cytosine
was replaced with a uracil located in a non-paired, stem, bulge
or DNA loop region. The relative fluorescence units (RFU) from a uracil
in the open, stem, three-base loop and bulge locations in the presence of
UDG are shown. The average and standard deviation from three
independent experiments are shown.

Experimental confirmation of mFold ssDNA structural
predictions

The oligonucleotides used in this study were
selected in part because they have a single structural
prediction in the mFold program (24). Specific parameters
have been designed into the mFold program for
ssDNA folding that were based upon NMR data
generated with ssDNA sequences (29-34). To
experimentally validate these predictions for the
oligonucleotides used in this study, we analysed a small
subset whose predicted secondary structures
create restriction enzyme sites. Confirmation of the
presence of these restriction sites would provide one
line of experimental evidence in support of the structures
predicted by the mFold program for the oligonucleotides tested.
Oligonucleotide GccG set 1 stem, when folded, creates an MspI
restriction site that is not present in the non-folded
version of the oligonucleotide. A strong FRET signal was
observed when oligonucleotide GccG set 1 stem was incubated
with MspI, but not when the non-folding version of the
oligonucleotide (i.e., GccG set 1 open) was incubated with
MspI (Figure 5B). These data suggest that the predicted structure
for GccG set 1 stem is correct. We conducted a similar analysis
with GccG set 2 stem and GccG set 2 open, where the predicted
folded structure for GccG set 2 stem creates an AciI restriction
site that does not occur in GccG set 2 open (Figure 5A).
Incubation of each oligonucleotide with AciI led to a strong
FRET signal only with the GccG set 2 stem oligonucleotide, which
also suggests that the mFold structural predictions
for the oligonucleotides used in this study are correct.
Although these data support the proposed intramolecular
structural predictions, it is formally possible that stable
structures could also arise from the set 1 stem or set
2 stem oligonucleotides through the formation of intermolecular
homodimers. For instance, the restriction enzyme
analysis conducted above with MspI and AciI cannot by itself
differentiate between a single intramolecular stem and
an intermolecular homodimer stem, although it does confirm
stem formation by the participating nucleotide bases.

Figure 5. Experimental confirmation of ssDNA secondary structures.
Two sets of oligonucleotides with restriction enzyme sites in the stem
oligonucleotide were used ((A) GccG Set 2 Stem and GccG Set 2
Open, and (B) GccG Set 1 Stem and GccG Set 1 Open). The stem
bases in the structured oligonucleotides create an AciI restriction site
(A) or an MspI restriction site (B). The oligonucleotides GccG Set 2
Open and GccG Set 1 Open did not fold to form the restriction
enzyme sites and remained intact. The average and standard deviation
from three independent experiments are shown.
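The restriction-enzyme test relies on the stem pairing one copy of the recognition sequence with its reverse complement elsewhere in the strand. The Python sketch below is a rough, string-level screen for that possibility. It is our illustration rather than the authors' procedure, the helper names are hypothetical, and it ignores loop length, stacking energies and whether the two copies can actually align. The recognition sequences used, 5'-CCGG-3' for MspI and 5'-CCGC-3' for AciI, are the standard sites for these enzymes. As noted above, a positive result would equally fit an intermolecular homodimer.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
SITES = {"MspI": "CCGG", "AciI": "CCGC"}  # standard recognition sequences


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def occurrences(seq: str, motif: str) -> list:
    """All 0-based start positions of motif in seq (overlaps allowed)."""
    return [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]


def could_pair_site(seq: str, site: str) -> bool:
    """Hypothetical screen: do the site and its reverse complement occur at
    non-overlapping positions, so that they could base-pair into a duplex site?"""
    k = len(site)
    fwd = occurrences(seq, site)
    rev = occurrences(seq, revcomp(site))
    return any(abs(i - j) >= k for i in fwd for j in rev)


OLIGOS = {  # name -> (sequence from Table 1, enzyme tested)
    "GccG set 1 stem": ("ATTGAGCCGGAATGATGTCCGGGAATATG", "MspI"),
    "GccG set 1 open": ("ATTGAGCCGGAATGATGTCATTGAATATG", "MspI"),
    "GccG set 2 stem": ("AAGCCGCGAGAGAGATCGCGGCAAG", "AciI"),
    "GccG set 2 open": ("AAGCCGCGAGAGAGATCGCATCAAG", "AciI"),
}

if __name__ == "__main__":
    for name, (seq, enzyme) in OLIGOS.items():
        print(f"{name} / {enzyme}: {could_pair_site(seq, SITES[enzyme])}")
```

The screen flags only the two stem oligonucleotides. This matches the pattern of cleavage described above, but it is a necessary condition for a duplex site, not evidence of folding.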

DISCUSSION 

The goal of this study was to investigate the determinants
of A3G hotspots. To do this, we used an experimental
model system based on oligonucleotides
dual-labeled with TAMRA and FAM fluorophores.
Lysates from cells stably expressing A3G were incubated
with these oligonucleotides, and A3G activity was
detected by FRET. Oligonucleotides were designed to
test (i) the role of the nucleotide bases adjacent to the
cytosine dinucleotide target site that is critical for
cytosine deamination; and (ii) the role of ssDNA secondary
structure, including DNA duplexes, loop sequences
and bulges.

We found that the ability of A3G to deaminate cytosines
was strongly dependent upon the nucleotide bases
immediately adjacent to the cytosine dinucleotide.
Specifically, A3G efficiently deaminates when cytosines,
adenines or thymines are adjacent to the cytosine
dinucleotide. A3G activity was low when guanine bases
were on either side of the cytosine dinucleotide. Previous
studies have indicated that A3G prefers a sequence
context of 5'-CCCA-3' or 5'-T/CCC-3' (12,13), which
corresponds well with the data in our study. In addition,
a study has reported that 5'-TCCA-3',
5'-ACCA-3' and 5'-ACCG-3' are good substrates
for A3G cytosine deaminase activity, whereas
5'-GCCA-3' and 5'-ACCT-3' show no or
minimal activity, respectively (35). Although these
studies support parts of our current study, our observations
represent a more extensive and complete survey of the
preferred bases and therefore provide greater insight into
predicting and identifying A3G hotspots.

The results from our studies are supported by studies
investigating A3G hotspots identified in HIV-1 sequences
recovered from infected individuals. Coffin and colleagues
found that of the available sites for A3G-mediated
cytosine deamination, 40% of 5'-CCCC-3' sequences,
21% of 5'-ACCA-3' sequences, 11% of 5'-TCCT-3' sequences
and 0% of 5'-GCCG-3' sequences were A3G
cytosine deamination sites, which correlates strikingly
with our data (2). Furthermore, 5'-TCCA-3', 5'-TCCC-3'
and 5'-ACCT-3' sequences were found to be locations at which
the authors concluded that A3G-mediated cytosine
deamination had occurred, while no cytosine deamination
occurred at 5'-GCCT/A-3' sequences. Finally, HIV-1
proviral sequencing data from our previous studies of
A3G-mediated cytosine deamination of HIV-1 in cell
culture found G-to-A mutations (in the positive strand)
at 5'-GGGG-3', 5'-TGGT-3' and 5'-TGGA-3' sequences
and few or no mutations at 5'-AGGA-3',
5'-AGGT-3', 5'-TGGC-3' or 5'-CGGC-3' sequences
(3,36). Taken together, these data indicate that the A3G
hotspot sequence preferences identified in our cell-free study
using oligonucleotides correspond closely to, and have strong
predictive power for, the A3G hotspot preferences observed from
HIV-1 proviral DNA sequencing. In addition, A3G mutational
hotspots are clearly far more complex than the widely
cited 5'-CCCA-3' or 5'-T/CCC-3' A3G nucleotide
sequence preference.
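A3G deaminates cytosines in minus-strand DNA, and these changes appear as G-to-A mutations on the plus strand. A plus-strand context therefore maps to a cytosine dinucleotide context given by its reverse complement. The Python sketch below performs that conversion for the plus-strand contexts listed above, so they can be compared directly with the XccX oligonucleotide names used in this study. The dictionary layout and names are our own.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


# Plus-strand contexts from the HIV-1 proviral data cited above (3,36).
PLUS_STRAND = {
    "G-to-A mutations observed": ["GGGG", "TGGT", "TGGA"],
    "few or no mutations": ["AGGA", "AGGT", "TGGC", "CGGC"],
}

if __name__ == "__main__":
    for label, contexts in PLUS_STRAND.items():
        pairs = ", ".join(f"5'-{p}-3' -> 5'-{revcomp(p)}-3'" for p in contexts)
        print(f"{label}: {pairs}")
```

The mutated group maps to CCCC, ACCA and TCCA, and the group with few or no mutations maps to TCCT, ACCT, GCCA and GCCG. This puts the proviral data in the same 5'-XCCY-3' notation as the in vitro experiments.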

A recent study investigated the features of nucleotide 
bases that can help define the sequence preference of 
A3G (11). Exocyclic groups in pyrimidines that are 
located 1 or 2 nt 5' of the cytosine targeted by A3G 
were found to dictate substrate recognition. The exocyclic
groups were speculated to be important for stacking or for
electrostatic interactions among adjacent bases, and it was
conjectured that disrupting these interactions could affect
the ability of A3G to recognize the substrate. This hypothesis
is supported by our data with the sequence 5'-T/CCCG/A/C-3'.
However, we observed that the sequence 5'-ACCA-3' can also
be an efficient target for A3G-mediated cytosine deamination.
Of particular relevance to these observations, local sequence
context has been found to influence the scanning ability of
A3G, and A3G has been proposed to 'hover' over 5'-ACCCA-3'
sequences longer than over 5'-TCCCT-3' sequences (37).
This observation suggests that additional features of
adenine bases may be important in attracting
A3G to sites of cytosine deamination.

We have also demonstrated in this study that ssDNA
secondary structure plays a vital role in defining
A3G hotspots. In particular, the data presented here
indicate that A3G has little or no deamination activity
on cytosine dinucleotides located in ssDNA oligonucleotide
stem structures or in three-base loops. The low
activity observed on stems may be due to infrequent unzipping
of bases from the stem, which would transiently expose
the CC dinucleotide. Furthermore, cytosine dinucleotides
in ssDNA loops were efficiently deaminated when flanked by
cytosines in loops of 4-8 bases or when flanked by adenines
in loops of 8-10 bases, whereas cytosine dinucleotides flanked
by adenines underwent only a moderate level of A3G-mediated
cytosine deamination in loops of 5-7 bases. A related
cytosine deaminase family member, AID, has also been
reported to deaminate cytosines inefficiently in
ssDNA secondary structures involving stems and loops
(38). Since A3G cannot deaminate dsDNA, we hypothesize
that ssDNA stems mimic dsDNA closely enough
to escape cytosine deamination. Furthermore, A3G has
previously been shown to be processive along ssDNA substrates,
moving by sliding and jumping (39,40). When A3G encounters
partially double-stranded DNA, its sliding ability is lost but its
jumping ability is retained (39). It is therefore
tempting to speculate that A3G could 'jump' over
stems and loops, which could help explain at least
part of the reduced level of cytosine deamination in
those regions. However, the ability of A3G to 'jump'
over stems and loops does not account for the differences
observed between 5'-CCCC-3' and 5'-ACCA-3' sequences
in ssDNA loops. While the torsional bend of
nucleotides in ssDNA loops may prevent the
proper contacts between A3G and the nucleotide bases,
this may be more readily overcome with flanking cytosines
than with adenines. Further studies are needed to
determine the specific factors behind these observations,
and it will be valuable to compare the
sequence and secondary structural preferences of other
APOBEC3 family members with our findings on
APOBEC3G. The addition of HIV-1 NC protein was
found in our experiments to have no effect on A3G
activity when the CC dinucleotide was in either the
AccA set 2 open or stem oligonucleotide. These observations
suggest that ssDNA secondary structure (e.g.,
stem structures) could act as an accessibility barrier for
A3G even in the presence of physiologically relevant
concentrations of HIV-1 NC. It is presently unclear
how generally applicable these observations are to
other CC dinucleotide positions.

In summary, the observations made in this study 
provide the first demonstration that A3G cytosine de- 
amination hotspots are defined by both the sequence 
context of the cytosine dinucleotide target as well as the 
ssDNA secondary structure. These observations provide 
useful information for predicting the locations of cytosine 
deamination by A3G. Such predictions are important for
investigations into the origins of mutations
that are associated with HIV-1 genetic variation
(14). Given the high HIV-1 mutation rate (41) and the
high rate of G-to-A transition mutations (42), there is
intense interest in the origins of mutations that arise
during HIV-1 replication. Knowledge of the origins
of mutations during HIV-1 replication is important for
developing a better understanding of HIV-1 genetic
variation and evolution, as well as for efforts to purposely
elevate the HIV-1 mutation rate to induce lethal mutagenesis
(43,44).